A Non-parametric Maximum Entropy Clustering
Authors
Abstract
Clustering is a fundamental tool for exploratory data analysis. Information-theoretic clustering is based on the optimization of information-theoretic quantities such as entropy and mutual information. Because these quantities can be estimated in a non-parametric manner, non-parametric information-theoretic clustering has recently gained much attention. The cluster-conditional information-theoretic quantities are estimated by assuming that the dataset is sampled from a certain cluster and assigning different sampling weights depending on the cluster. In this paper, a simple clustering algorithm is proposed based on the principle of maximum entropy. The algorithm is experimentally shown to be comparable to, or to outperform, conventional non-parametric clustering methods.
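As a rough illustration of what a weighted, non-parametric estimate of a cluster-conditional entropy can look like, the sketch below uses a Gaussian Parzen-window density with a resubstitution estimate of Shannon entropy. This is an assumption made for illustration only: the abstract does not specify the estimator, and the function name `weighted_parzen_entropy`, the `bandwidth` parameter, and the choice of kernel are all hypothetical.

```python
import math


def weighted_parzen_entropy(points, weights, bandwidth=0.5):
    """Resubstitution estimate of Shannon entropy under sampling weights.

    Hypothetical sketch, not the paper's estimator: the density is a
    weighted Gaussian Parzen window, p(x) = sum_j w_j N(x; x_j, h^2 I),
    and the entropy estimate is H = -sum_i w_i log p(x_i).
    """
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalize weights to sum to 1
    dim = len(points[0])
    # Normalizing constant of an isotropic Gaussian kernel in `dim` dimensions.
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-dim / 2.0)

    entropy = 0.0
    for xi, wi in zip(points, w):
        if wi == 0.0:
            continue  # zero-weight samples do not belong to this cluster
        density = 0.0
        for xj, wj in zip(points, w):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            density += wj * norm * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        entropy -= wi * math.log(density)
    return entropy


# Usage: a compact group of points yields a lower estimate than a spread-out one.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
wide = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
uniform = [1.0] * 4
print(weighted_parzen_entropy(tight, uniform) < weighted_parzen_entropy(wide, uniform))
```

Setting a sample's weight to zero removes it from both the density and the average, so one weight vector per cluster gives one entropy estimate per cluster over the same dataset.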
Similar resources
Maximum Probability and Relative Entropy Maximization. Bayesian Maximum Probability and Empirical Likelihood
Works, briefly surveyed here, are concerned with two basic methods: Maximum Probability and Bayesian Maximum Probability; as well as with their asymptotic instances: Relative Entropy Maximization and Maximum Non-parametric Likelihood. Parametric and empirical extensions of the latter methods – Empirical Maximum Maximum Entropy and Empirical Likelihood – are also mentioned. The methods are viewe...
Empirical Maximum Entropy Methods
A method, which we suggest calling the Empirical Maximum Entropy method, is implicitly present in the Maximum Entropy Empirical Likelihood method as its special, non-parametric case. From this vantage point, the entropy-based empirical approach to estimation is surveyed.
Most Likely Maximum Entropy for Population Analysis with Region-Censored Data
The paper proposes a new non-parametric density estimator from region-censored observations with application in the context of population studies, where standard maximum likelihood is affected by over-fitting and non-uniqueness problems. It is a maximum entropy estimator that satisfies a set of constraints imposing a close fit to the empirical distributions associated with the set of censoring ...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Determination of height of urban buildings based on non-parametric estimation of signal spectrum in SAR data tomography
Nowadays, the TomoSAR technique has been able to overcome the limitations of radar interferometry techniques in separating multiple scatterers within a pixel. By extending the principles of virtual aperture to the elevation direction, these techniques have received much attention in the analysis of challenging urban areas. Despite the expectation of interference of the distribution of buildings with different...